🎨 Markdown formatting improvements #2245

rowanc1 · 2025-08-12T04:26:02Z

This improves the markdown formatting output.

merge 🐛 mdast lists need to be in paragraphs #2241
Have a path forward on how to store parsed directive/option/body data on the nodes
Fix parser tests

changeset-bot · 2025-08-12T04:26:05Z

🦋 Changeset detected

Latest commit: dd786b9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 6 packages

Name	Type
myst-parser	Patch
myst-to-md	Patch
myst-roles	Patch
myst-directives	Patch
myst-to-html	Patch
myst-transforms	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

rowanc1

Highlighting changes below. We need to get the parsing data onto the mystDirective/Role nodes, so that we have an easier time in formatting. Would love some thoughts on direction here.

rowanc1 · 2025-08-12T14:39:33Z

packages/myst-ext-card/tests/card.spec.ts

      directives: [cardDirective],
    });
-    expect(output).toEqual(expected);
+    expect(output).toMatchObject(expected);


Tests in the extension code are made a bit less strict here. No changes to the code, so no need to bump the packages.

rowanc1 · 2025-08-12T14:45:58Z

packages/myst-parser/src/directives.ts

        if (argSpec.required && data.arg == null) {
          validationError = true;
        }
+        node.args = data.arg;


This is the biggest change is that we put the arg, options and body of roles and directives onto the mystDirective node. This allows us to process them afterward in a formatting package - or transform a figure into a directive etc. rather than doing that in the markdown writer.

This shouldn't matter for the duplication as we remove these nodes early in the processing for most things. But it is really helpful to have them for the formatting / writing of markdown.

We might want to change it to node.argsParsed or something so we have the pre/post parsing? Or put it on the data field.

@fwkoch or @agoose77 if you have opinions here, that would help.

Ok - seeing this, my first impression was "we already stash this stuff on the mystDirective nodes, don't we...?" - That's not entirely true.

On the initial parse pass, we do save args and options to the mystDirective node; these are the raw, unparsed values. I don't think these fields are used anywhere. Additionally, we create mystDirectiveArg/Option/Body nodes, where the raw value is saved to be parsed, in this applyDirectives function. Within this function we are building a data object of type DirectiveData which contains all the parsed arg/option/body content. Critically, this data object is transient; it's just passed directly to the directive plugins.

So, the change in this PR is taking the parsed, previously transient arg/option/body values and saving them directly to the mystDirective node. I agree this is a good change.

Getting back to your specific question @rowanc1 - are we putting this in the right place? The biggest issue is that we are clobbering raw arg/option values with parsed arg/option values. I think this is fine; we really should only ever be dealing with the parsed values. The only argument I can think of for keeping the raw values is for round-tripping documents exactly as written. But that's not useful at all - MyST does not care about this (I don't think!), we just don't want to lose any data, and for that, parsed structured data is better. (This is especially true if we are thinking about converting between formats - in that case, any raw value is totally useless.)

I guess the other argument against clobbering the raw values with parsed values is simply backwards compatibility. This is a backwards incompatible change that I don't think will impact anyone, but there could be something somewhere using these raw mystDirective fields. (Maybe we make this a minor version bump?)

Regarding the alternatives you suggest:

I don't think we should do argsParsed... that feels excessive; we are keeping track if node.processed === false, so if that's not the case (i.e. the node is processed), args will be parsed. It's also a little silly with bodyParsed since we are not saving the raw body...

Stashing args/options/body on node.data would be fine - this is just taking the DirectiveData object and saving it as "directive data." I somewhat like that, but at the same time, the data field is usually used for metadata - in the case of mystDirective nodes, args/options/body are the fundamental content, not metadata. 🤷

So maybe it's just fine as-implemented? (We could potentially even stop creating the data object, and pass the node as DirectiveData to the run function... But I think for now this is overthinking it...)

TL;DR: 👍 to your change, maybe minor version bump.

fwkoch

Ok - I responded to your question @rowanc1 - but after getting to understand this PR a little more, I think we need slight clarity on the direction:

The tests for these improvements, especially around directive/role handling, are focused on a narrow scoped pipeline that only includes myst-parser and myst-to-md. It ignores all the transforms and additional processing included in the full myst-cli pipeline.

For me to feel comfortable with these changes we need to do one of three things:

Clarify the scope of this PR is isolated to standalone usage of myst-parser and myst-to-md; we should not expect to see the same improvements when using the full myst-cli pipeline.
Modify the myst-cli pipeline for md export to perform fewer transforms (including no lifting of mystDirective nodes, no propagating mystTargets, etc).
Refactor the changes here so they provide the same improvements to post-processed myst-cli mdast as they do to unprocessed myst-parser mdast.

Happy to discuss all this with you more!

fwkoch · 2025-08-14T06:56:25Z

packages/myst-parser/src/directives.ts

        if (argSpec.required && data.arg == null) {
          validationError = true;
        }
+        node.args = data.arg;


Ok - seeing this, my first impression was "we already stash this stuff on the mystDirective nodes, don't we...?" - That's not entirely true.

On the initial parse pass, we do save args and options to the mystDirective node; these are the raw, unparsed values. I don't think these fields are used anywhere. Additionally, we create mystDirectiveArg/Option/Body nodes, where the raw value is saved to be parsed, in this applyDirectives function. Within this function we are building a data object of type DirectiveData which contains all the parsed arg/option/body content. Critically, this data object is transient; it's just passed directly to the directive plugins.

So, the change in this PR is taking the parsed, previously transient arg/option/body values and saving them directly to the mystDirective node. I agree this is a good change.

Getting back to your specific question @rowanc1 - are we putting this in the right place? The biggest issue is that we are clobbering raw arg/option values with parsed arg/option values. I think this is fine; we really should only ever be dealing with the parsed values. The only argument I can think of for keeping the raw values is for round-tripping documents exactly as written. But that's not useful at all - MyST does not care about this (I don't think!), we just don't want to lose any data, and for that, parsed structured data is better. (This is especially true if we are thinking about converting between formats - in that case, any raw value is totally useless.)

I guess the other argument against clobbering the raw values with parsed values is simply backwards compatibility. This is a backwards incompatible change that I don't think will impact anyone, but there could be something somewhere using these raw mystDirective fields. (Maybe we make this a minor version bump?)

Regarding the alternatives you suggest:

I don't think we should do argsParsed... that feels excessive; we are keeping track if node.processed === false, so if that's not the case (i.e. the node is processed), args will be parsed. It's also a little silly with bodyParsed since we are not saving the raw body...

Stashing args/options/body on node.data would be fine - this is just taking the DirectiveData object and saving it as "directive data." I somewhat like that, but at the same time, the data field is usually used for metadata - in the case of mystDirective nodes, args/options/body are the fundamental content, not metadata. 🤷

So maybe it's just fine as-implemented? (We could potentially even stop creating the data object, and pass the node as DirectiveData to the run function... But I think for now this is overthinking it...)

TL;DR: 👍 to your change, maybe minor version bump.

fwkoch · 2025-08-14T07:02:10Z

packages/myst-to-md/src/directives.ts

+          .join(' ')
+      : '';
+    const optOther = Object.entries(otherOptions)
+      .map(([key, value]) => `${key}=${value}`)


Writing all options this way feels risky. Should we check if value has spaces?

fwkoch · 2025-08-14T07:08:50Z

packages/myst-to-md/src/directives.ts

+    const marker = markerChar.repeat(markerLength);
+    const directiveInline = [node.name, directiveOpts].filter(Boolean).join(' ');
+    const args = Array.isArray(node.args)
+      ? state.containerPhrasing({ type: 'heading', depth: 1, children: node.args }, info)


I don't get this - Are we injecting this heading just to prepend #...? I don't think we always want to do that? But I might just be completely misunderstanding!

fwkoch · 2025-08-14T07:13:48Z

packages/myst-to-md/tests/mdast/directives.yml

-             ````{abc}
-            ````
-            print("hello world")
+          value: Ax=b


Is there a reason this value was simplified? I guess maybe backticks no longer matter so much in math since we are using $$ instead of math directive.

But what about the equivalent issue - what if there is a $$ inside the math value...? Are we handling that correctly?

fwkoch · 2025-08-14T07:16:47Z

packages/myst-to-md/tests/mdast/directives.yml

+            :class: my-class
+            :name: my-note


I don't think this test is quite correct. By the time we have a mystDirective node, I think these options should already be stripped off into a separate field. (Now the test is just treating these as part of the body)

fwkoch · 2025-08-14T07:43:19Z

packages/myst-to-md/src/roles.ts

+    return writeStaticRole('math')(node, _, state);
+  }
+  const tracker = state.createTracker(info);
+  return tracker.move(`$${node.value}$`);


Why does this have two $ at the beginning?

fwkoch · 2025-08-14T07:50:15Z

packages/myst-to-md/src/directives.ts

 */
-function mystDirective(node: any, _: Parent, state: NestedState, info: Info): string {
-  return writeStaticDirective(node.name, { argsKey: 'args' })(node, _, state, info);
+function writeDirective(options?: DirectiveOptions) {


Ok, I'm feeling a little confused about this.

First, I don't think we need this wrapper "factory function" - options is unused. We can just define the function directly.

Second, we are only using writeDirective when we encounter mystDirective nodes. These only persist in the tree when they have an unknown directive type. In that case, in the applyDirectives function, we are deleting the children and writing the options back to the body. It never even gets to the new code where node.args/node.options/node.body are set. For known directives, the mystDirective nodes get lifted out.

fwkoch · 2025-08-14T07:51:45Z

packages/myst-to-md/tests/roundtrip/roundtrip.yml

+      (my-label)=
+      # Heading
+
+      :::{admonition #my-label .dropdown} Admonition title


I'm extra confused why this works - it's a known directive type but it's using the new writeDirective code, which only applies to mystDirective nodes, not admonition nodes.

fwkoch · 2025-08-14T07:54:19Z

packages/myst-to-md/tests/roundtrip/roundtrip.spec.ts

+        const mdast = mystParse(markdown, { mdast: { hoistSingleImagesOutofParagraphs: false } });
+        const file = unified()
+          .use(mystToMd)


Ohhhh... now I understand. This roundtrip test is checking the myst-parse -> myst-to-md roundtrip. It is not using any of the myst-cli pipeline. Therefore, it's skipping all the basic transforms which are just applied by default in myst-cli.

If we were to try to round-trip via myst-cli, I don't think we'd see the same results.

rowanc1 added the enhancement New feature or request label Aug 12, 2025

rowanc1 commented Aug 12, 2025

View reviewed changes

Base automatically changed from mdast/lists to main August 13, 2025 14:01

ℹ Improve formatting of myst-to-md

dd786b9

rowanc1 force-pushed the md-improvements branch from 1599ced to dd786b9 Compare August 13, 2025 14:02

fwkoch reviewed Aug 14, 2025

View reviewed changes

jupyter-book-pr-triage-bot bot added this to PR triage (experimental) Nov 29, 2025

🎨 Markdown formatting improvements #2245

Are you sure you want to change the base?

🎨 Markdown formatting improvements #2245

Conversation

rowanc1 commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot bot commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

rowanc1 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fwkoch left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

rowanc1 commented Aug 12, 2025 •

edited

Loading

changeset-bot bot commented Aug 12, 2025 •

edited

Loading